feat(x86_64): boot Asterinas as zone1 via Multiboot2, with virtio-blk/net/console #322
yydawx wants to merge 4 commits into
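Because verifying this is quite tedious, I am providing an agent-generated guide. Feel free to reach out if you have any questions:

Running Asterinas on hvisor (x86_64 QEMU)

Overview
This document describes how to boot the Asterinas kernel as a zone1 virtual machine on hvisor via the Multiboot2 protocol.
Tested versions:

Asterinas kernel requirements
Asterinas kernel build parameters:
Where:
Note: Asterinas can boot under hvisor without our kernel modifications. Optional hvisor-side changes (ccf-asterinas branch): 4 commits in total, based on upstream d3260d0:
hvisor-tool-side changes (ccf-asterinas branch): 4 commits in total, based on upstream b45971a:

Host environment

Memory layout
Asterinas zone1 8 GB example (non-contiguous EPT regions, working around the ECAM hole):
About 8 GB in total, spread across 5 RAM regions in the EPT.

Build and deploy
Full build
1. Build Asterinas (requires a Docker container):
2. Build hvisor:
3. Build the hvisor daemon:
4. Deploy to the rootfs:
Quick rebuild (hvisor only)
Quick rebuild (daemon only)

Run
Start hvisor
Start the services and zone1 from zone0 (root Linux)
Operate inside zone1 (Asterinas)

Configuration files
zone1-asterinas.json
Defines zone1's memory regions, CPUs, kernel path, initramfs, and Multiboot2 parameters.
Key fields:

Known issues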
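I have found that configuring virtio for Asterinas currently requires modifying the Asterinas source code. Is that acceptable, or do we need to come up with a better approach?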
I also encountered this problem when configuring virtio, and I think it is acceptable to make a few changes to Asterinas.
    let zone = this_zone_arc.read();
    // The guest IOAPIC RTE may route to a CPU outside this zone.
    // If so, redirect to the zone's first CPU so the interrupt
    // reaches the correct guest.
I added a CPU redirect fix to VirtIoApic::write() (lines 136-142) in my last commit, so you can remove the redundant fixes.
    /// Walk guest page tables for virtual address `vaddr` using CR3 as the PML4 base.
    /// Prints the full page table hierarchy for debugging.
    fn walk_guest_page_table(vaddr: usize, cr3_gpa: usize) {
You could reuse the gva_to_gpa() function in mmio.rs for the page walk.
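For readers unfamiliar with what such a walk involves: a 4-level x86_64 translation splits the virtual address into four 9-bit table indices plus a 12-bit page offset. The sketch below shows only that index arithmetic, assuming 4 KiB pages. `PageWalkIndices` and `split_vaddr` are hypothetical names used for illustration; they are not the signature of gva_to_gpa() in mmio.rs.

```rust
/// Indices used at each level of a 4-level x86_64 page walk.
/// Hypothetical helper type for illustration, not part of hvisor.
#[derive(Debug, PartialEq, Eq)]
pub struct PageWalkIndices {
    pub pml4: usize,
    pub pdpt: usize,
    pub pd: usize,
    pub pt: usize,
    pub offset: usize,
}

/// Split a virtual address into per-level table indices.
/// Assumes 4 KiB pages (no huge-page handling).
pub fn split_vaddr(vaddr: usize) -> PageWalkIndices {
    PageWalkIndices {
        pml4: (vaddr >> 39) & 0x1ff, // bits 39..=47
        pdpt: (vaddr >> 30) & 0x1ff, // bits 30..=38
        pd: (vaddr >> 21) & 0x1ff,   // bits 21..=29
        pt: (vaddr >> 12) & 0x1ff,   // bits 12..=20
        offset: vaddr & 0xfff,       // bits 0..=11
    }
}
```

A real walk would read the entry at each index, check the present bit, and stop early on huge-page entries; that part depends on how guest memory is accessed and is left out here.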
    pub name: [u8; CONFIG_NAME_MAXLEN],
    // Multiboot support (NEW)
    pub multiboot_info_paddr: u64,
    pub multiboot_enabled: u32,
Consider moving multiboot_info_paddr and multiboot_enabled into arch_config: they are x86-specific settings and should not be shared with other architectures.
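A minimal sketch of the suggested layout, assuming the two fields keep the types shown in the diff. `ArchZoneConfigSketch`, `ZoneConfigSketch`, `zone_id`, and `uses_multiboot` are hypothetical names for illustration; the real structs contain more fields.

```rust
/// x86-specific part of a zone config (hypothetical sketch).
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
pub struct ArchZoneConfigSketch {
    pub multiboot_info_paddr: u64,
    pub multiboot_enabled: u32,
}

/// Architecture-independent zone config that only embeds the arch part.
/// `zone_id` stands in for the shared fields, which are omitted here.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
pub struct ZoneConfigSketch {
    pub zone_id: u32,
    pub arch_config: ArchZoneConfigSketch,
}

impl ZoneConfigSketch {
    /// Shared code asks the arch config instead of reading x86 fields directly.
    pub fn uses_multiboot(&self) -> bool {
        self.arch_config.multiboot_enabled != 0
    }
}
```

With this shape, other architectures can define their own arch config type without carrying unused Multiboot fields.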
    Some(zone_arc) => {
        let target_cpu = get_target_cpu(irq_id as _, target_zone as _);
        // Verify target_cpu belongs to target_zone.
        // The guest IOAPIC may route IRQs to an APIC ID that now
        // belongs to a different zone, which would cause the IRQ
        // to be injected into the wrong guest.
        let zone = zone_arc.read();
        if zone.cpu_set.bitmap & (1u64 << target_cpu) != 0 {
            target_cpu
        } else {
            trace!("virtio: IRQ {} for zone {} routed to CPU {} outside zone, falling back to CPU {}",
                irq_id, target_zone, target_cpu,
                zone.cpu_set.first_cpu().unwrap());
            zone.cpu_set.first_cpu().unwrap()
        }
    }
This is another redundant IOAPIC redirect fix that should be removed. By the way, we should avoid adding arch-specific content to code and files shared by all architectures.
The IOAPIC redirect in VirtIoApic::write() only takes effect when the guest actively reconfigures IOAPIC entries, but the initial RTE state is inherited from zone0 when zone1 starts and may point to CPUs outside zone1. Without the fallback in handle_hvc_finish_req, virtio IRQs are delivered to the wrong guest. Tested: removing this breaks virtio console input.
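The fallback being discussed boils down to a bitmap membership test followed by a default. A self-contained sketch, assuming a 64-bit CPU bitmap as in the quoted diff; this `CpuSet` is a stand-in type and `pick_irq_target` is a hypothetical name, not the hvisor implementation:

```rust
/// Stand-in for a zone's CPU set: bit N set means CPU N belongs to the zone.
#[derive(Clone, Copy)]
pub struct CpuSet {
    pub bitmap: u64,
}

impl CpuSet {
    /// True if `cpu` is part of this set (CPUs >= 64 never are).
    pub fn contains(&self, cpu: usize) -> bool {
        cpu < 64 && (self.bitmap & (1u64 << cpu)) != 0
    }

    /// Lowest-numbered CPU in the set, if any.
    pub fn first_cpu(&self) -> Option<usize> {
        if self.bitmap == 0 {
            None
        } else {
            Some(self.bitmap.trailing_zeros() as usize)
        }
    }
}

/// Keep the routed CPU if it belongs to the zone, otherwise fall back
/// to the zone's first CPU. Returns None for an empty zone.
pub fn pick_irq_target(cpu_set: CpuSet, routed_cpu: usize) -> Option<usize> {
    if cpu_set.contains(routed_cpu) {
        Some(routed_cpu)
    } else {
        cpu_set.first_cpu()
    }
}
```

Returning `Option` instead of calling `unwrap()` leaves the empty-zone case to the caller, which is one possible design choice rather than what the PR does.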
    }
    IdtVector::I8042_KEYBOARD_VECTOR => {}
    IdtVector::APIC_SPURIOUS_VECTOR | IdtVector::APIC_ERROR_VECTOR => {}
    _ => {
        if vector >= 0x20 && this_cpu_data().arch_cpu.power_on {
            inject_vector(this_cpu_id(), vector, None, false);
    IdtVector::APIC_SPURIOUS_VECTOR
    | IdtVector::APIC_ERROR_VECTOR => {}
        // programmed the LAPIC. They belong to the CURRENT zone,
        // not zone0. Device interrupts (0x20-0xdf) always belong to
        // zone0 and must be forwarded if they arrive on a non-zone0 CPU.
        // Check if this is a LAPIC-local interrupt.
        // The guest's timer vector is dynamically allocated and may be < 0xe0,
        // so we also check against the tracked LAPIC timer vector.
        let is_lapic_local = vector >= 0xe0
            || vector == this_cpu_data().arch_cpu.virt_lapic.virt_timer_vector as u8;
        if zone_id == 0 || is_lapic_local {
            inject_vector(cpu_id, vector, None, false);
        } else {
            // Forward device interrupt to zone0.
            let zone0 = crate::zone::find_zone(0).unwrap();
            let zone0_cpu = zone0.read().cpu_set.first_cpu().unwrap_or(0);
            inject_vector(zone0_cpu, vector, None, false);
        }
Non-root zones should also be able to receive vectors injected by real hardware. Sometimes we may want zone1 to use real devices instead of virtio devices.
Solicey left a comment
I suggest making as few changes as possible to get Asterinas booting. You can look at my previous commit to see what has already been fixed, so that you do not need to add redundant fixes in your PR.
    /// When a non-root zone starts on a set of CPUs, ensure critical physical
    /// interrupts (UART, etc.) are not routed to those CPUs. If they are, re-route
    /// them to CPU 0 which stays in the root zone. Without this, zone0 can become
    /// unresponsive because physical interrupts get injected into a guest that has
    /// no handler for them.
    pub fn ioapic_reroute_from_cpus(cpu_set: &crate::cpu_data::CpuSet) {
        // Critical IRQs that the root zone needs for interactive console.
        const CRITICAL_IRQS: &[u8] = &[irqs::UART_COM1_IRQ];

        let mut io_apic = IO_APIC.lock();
        for &irq in CRITICAL_IRQS {
            // table_entry returns RedirectionTableEntry, transmute to u64 for
            // bit-field manipulation.
            let entry = unsafe { io_apic.table_entry(irq) };
            let raw: u64 = unsafe { core::mem::transmute(entry) };
            let dest_apic_id = raw.get_bits(56..=63) as usize;
            let dest_cpu = get_cpu_id(dest_apic_id);
            if cpu_set.bitmap & (1u64 << dest_cpu) != 0 {
                // Re-route to CPU 0 which is always in the root zone.
                let cpu0_apic_id = get_apic_id(0) as u64;
                let mut new_raw = raw;
                new_raw.set_bits(56..=63, cpu0_apic_id);
                let new_entry = unsafe { core::mem::transmute(new_raw) };
                unsafe { io_apic.set_table_entry(irq, new_entry) };
                warn!(
                    "ioapic: rerouted IRQ {} from CPU {} (APIC {:#x}) to CPU 0 (APIC {:#x})",
                    irq, dest_cpu, dest_apic_id, cpu0_apic_id
                );
            }
        }
    }
I don't think it is necessary to handle the IOAPIC reroute here. As mentioned earlier, this issue was fixed in my last commit. You can build your own modifications on top of my fix, but please avoid fixing the same problem again with redundant code.
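For context on what the quoted reroute code manipulates: the diff reads and rewrites bits 56..=63 of the 64-bit redirection table entry, which hold the destination APIC ID. The same bit-field update can be written with plain masks, as in this sketch. `rte_dest_apic_id` and `rte_with_dest` are hypothetical helper names, not functions from hvisor or the IOAPIC crate it uses.

```rust
/// Bit position of the destination field in a 64-bit IOAPIC RTE.
const DEST_SHIFT: u32 = 56;
/// Mask covering bits 56..=63.
const DEST_MASK: u64 = 0xff_u64 << DEST_SHIFT;

/// Extract the destination APIC ID from a raw RTE value.
pub fn rte_dest_apic_id(raw: u64) -> u8 {
    ((raw & DEST_MASK) >> DEST_SHIFT) as u8
}

/// Return a copy of `raw` with the destination APIC ID replaced,
/// leaving vector, delivery mode, mask bit, etc. untouched.
pub fn rte_with_dest(raw: u64, apic_id: u8) -> u64 {
    (raw & !DEST_MASK) | ((apic_id as u64) << DEST_SHIFT)
}
```

For example, an entry with vector 0x24 routed to APIC ID 3 keeps its vector when it is redirected to APIC ID 0.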
Most of the redundant code was added to work around problems I hit while booting, but some of it may indeed have no effect. I will find out which parts are unnecessary. Thanks for your review!
138a5b2 to a589ab3
0cb3ac4 to dd456e1
@Solicey Hi! I removed most of the redundant code, debug output, and comments. I also avoid hard-coding
- Add mb2_boot.S bootloader for 16-bit to 32-bit mode transition
- Bootloader sets up GDT with TSS and jumps to kernel entry
- Pass kernel entry via ESI to bootloader on VM entry
- Add multiboot_info_paddr/multiboot_enabled to HvArchZoneConfig
- Remove unused print_memory_map
- Add v_bus/v_device/v_function to HvPciDevConfig

- Add S2PT violation handler via MMIO dispatch
- Add GS_BASE/FS_BASE MSR read/write support

- Add NULL guard for VIRTIO_BRIDGE res_agent

- Adjust zone0 memory layout for zone1 coexistence
- Update virtio configuration for multi-zone setup
@Solicey Multiboot2 support is ready now; I forgot to mention you for review :) By the way, it seems that Asterinas produces a bzImage-type kernel when the legacy-32 parameter is enabled, but we need setup.bin and vmlinux.bin, right?
Summary
Adds Multiboot2 protocol support to boot Asterinas OS as a zone1 guest, using a minimal ASM bootloader. Minimal changes to core code — all x86-specific logic stays under arch/x86_64/.
Changes
Multiboot2 Boot Support (Commit 1: feat)
New mb2_boot.S bootloader: 16→32-bit transition with GDT + TSS setup, jumps to kernel entry
Loaded via boot_filepath in zone1 config, with GPA→HPA offset translation in hvisor-tool
ELF segment loading with kernel_entry_gpa passed to bootloader via ESI
multiboot_info_paddr/multiboot_enabled added to HvArchZoneConfig (x86-specific)
Multiboot path gated behind multiboot_enabled flag — Linux zone1 paths unaffected
Removed unused print_memory_map
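The GPA→HPA offset translation mentioned above can be pictured as a lookup over the zone's memory regions. The sketch below is written in Rust for consistency with the rest of this PR and is an assumption-laden illustration, not the hvisor-tool code: `MemRegion`, its field names, and `gpa_to_hpa` are hypothetical.

```rust
/// One guest RAM region: `size` bytes at guest-physical `gpa_start`
/// backed by host-physical memory at `hpa_start`. Hypothetical type.
#[derive(Clone, Copy)]
pub struct MemRegion {
    pub gpa_start: u64,
    pub hpa_start: u64,
    pub size: u64,
}

/// Translate a guest-physical address to a host-physical address by
/// finding the region that contains it and applying that region's offset.
/// Returns None if no region covers `gpa`.
pub fn gpa_to_hpa(regions: &[MemRegion], gpa: u64) -> Option<u64> {
    regions.iter().find_map(|r| {
        // checked_sub yields None when gpa lies below this region.
        let off = gpa.checked_sub(r.gpa_start)?;
        if off < r.size {
            r.hpa_start.checked_add(off)
        } else {
            None
        }
    })
}
```

Because the zone1 RAM is split into several non-contiguous regions, a loader has to perform this lookup per ELF segment rather than assume a single fixed offset.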
Exception Handling (Commit 2: fix)
S2PT (EPT) violation handler via MMIO dispatch
GS_BASE/FS_BASE MSR read/write support for 64-bit guests
x2APIC MSR fallback for unrecognized registers in x2APIC range
TSC frequency reporting via CPUID
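As a rough illustration of the MSR items above, the sketch below classifies an MSR index into FS_BASE, GS_BASE, the x2APIC range, or unhandled. The MSR numbers are the architectural ones (IA32_FS_BASE = 0xC0000100, IA32_GS_BASE = 0xC0000101, x2APIC registers at 0x800 through 0x8FF); `MsrClass` and `classify_msr` are hypothetical names, and how hvisor actually dispatches these accesses is not shown in this PR description.

```rust
pub const IA32_FS_BASE: u32 = 0xC000_0100;
pub const IA32_GS_BASE: u32 = 0xC000_0101;
/// MSR range reserved for APIC registers in x2APIC mode.
pub const X2APIC_MSR_FIRST: u32 = 0x800;
pub const X2APIC_MSR_LAST: u32 = 0x8FF;

/// Hypothetical classification of a guest MSR access.
#[derive(Debug, PartialEq, Eq)]
pub enum MsrClass {
    FsBase,
    GsBase,
    /// `reg_offset` is the equivalent xAPIC MMIO register offset.
    X2Apic { reg_offset: u32 },
    Unhandled,
}

/// Decide how an MSR index should be treated.
pub fn classify_msr(msr: u32) -> MsrClass {
    match msr {
        IA32_FS_BASE => MsrClass::FsBase,
        IA32_GS_BASE => MsrClass::GsBase,
        // Each x2APIC MSR maps to a 16-byte-aligned xAPIC register.
        X2APIC_MSR_FIRST..=X2APIC_MSR_LAST => MsrClass::X2Apic {
            reg_offset: (msr - X2APIC_MSR_FIRST) << 4,
        },
        _ => MsrClass::Unhandled,
    }
}
```

A fallback for unrecognized x2APIC registers would then live in the `X2Apic` arm, for instance by returning zero on reads and ignoring writes; that behavior is an assumption here.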
Virtio Robustness (Commit 3: fix)
NULL guard for VIRTIO_BRIDGE.res_agent() — returns gracefully instead of panic
Struct & Config Fixes (Commit 4: feat)
Added v_bus/v_device/v_function to HvPciDevConfig to match C side (fixes 128-byte zone_config size mismatch)
Bumped CONFIG_MAGIC_VERSION to 0x7 on both C and Rust sides
Zone0 memory layout and virtio config adjustments for zone1 coexistence
Example zone1 config: zone1-asterinas.json
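Size mismatches between the C and Rust views of a shared config struct can be caught at compile time. The sketch below shows one way to do that with a `const` assertion. `PciDevConfigSketch`, its `u8` field types, the `bus`/`device`/`function` fields, and `EXPECTED_SIZE` are all assumptions for illustration; the real HvPciDevConfig layout is defined by the hvisor and hvisor-tool sources.

```rust
/// Hypothetical mirror of a C struct shared between hvisor-tool and hvisor.
/// Field types are assumed; only the v_* field names come from the PR text.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
pub struct PciDevConfigSketch {
    pub bus: u8,
    pub device: u8,
    pub function: u8,
    pub v_bus: u8,
    pub v_device: u8,
    pub v_function: u8,
}

/// Size the C side is expected to report via sizeof (assumed value).
pub const EXPECTED_SIZE: usize = 6;

// Fails the build if the Rust layout drifts from the expected C size.
const _: () = assert!(core::mem::size_of::<PciDevConfigSketch>() == EXPECTED_SIZE);
```

Pairing a check like this with the CONFIG_MAGIC_VERSION bump means a stale tool binary is rejected at runtime and a stale struct definition is rejected at build time.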
Requires
yydawx/hvisor-tool#98 — Multiboot2 loading with GPA→HPA offset translation